ABSTRACT

The design of vocabulary tests, particularly for 
English as a Second Language, is discussed. The discussion is 
intended to help language teachers with little or no knowledge of 
testing gain a better understanding of vocabulary testing. First, a
set of principles for guiding the writing of vocabulary tests is
outlined, presented in the form of questions. The principles address 
the use that will be made of test results, determination of the words 
to be tested, testing of breadth vs. depth of knowledge, and how 
students' knowledge is to be elicited. Several tests of vocabulary 
size (breadth of knowledge) are examined, and several experimental 
tests that have the potential for measuring depth of knowledge are 
discussed. Contains three references and two notes. (MSE) 

VOCABULARY TESTING: QUESTIONS
FOR TEST DEVELOPMENT
WITH SIX EXAMPLES OF TESTS OF
VOCABULARY SIZE AND DEPTH

By Norbert Schmitt 

Minatogawa Women's College, Japan 

Introduction 

Although there has been some interest 
shown in vocabulary testing throughout 
this century (Sims, 1929; Cronbach, 1943; 
Dale, 1965; Perkins and Linnville, 1987), the 
recent surge of attention in vocabulary 
studies (Meara, 1987; Carter and McCarthy, 
1988; Coady, 1993) has given impetus to 
several fresh testing approaches. Unfortu- 
nately, these approaches have not yet fil- 
tered down to all classroom teachers, many 
of whom seem tied to traditional ways of 
thinking of and testing vocabulary. Although
vocabulary achievement tests (tests
which measure whether students have 
learned the words which they were taught 
in a class or course) remain largely un- 
changed, improved testing methods have 
been developed to measure vocabulary size. 
Perhaps more importantly, work is begin- 
ning on an emerging area of vocabulary 
testing - measuring how well individual 
words are learned (depth of knowledge), as 
opposed to the traditional Yes, the word is
known/No, it is not known dichotomy. This
paper aims to help teachers with little or no 
testing background improve their under- 
standing of vocabulary testing. It will at- 
tempt to do this by first proposing a set of 
principles, in the form of questions, which 
may prove useful in guiding the writing of 
better vocabulary tests. Next, several tests 
of vocabulary size will be examined. Fi- 
nally, several experimental tests which have 
potential for measuring learners' depth of 
knowledge will be discussed. A major 
theme that will run throughout the paper is
that teachers can write better vocabulary
tests if they have a clearer understanding of
precisely what aspects of word knowledge 
they wish to test. 



Four Questions For Developing A Vocabulary Test

1. WHY DO YOU WANT TO TEST? 
This question could be rephrased as "What 
use will you make of the resulting test 
scores?" There are several possible pur- 
poses for giving a vocabulary test. Perhaps 
the most common one is to find out if stu- 
dents have learned the words which were 
taught, or which they were expected to learn 
(achievement test). Alternatively, a teacher 
may want to find where their students' vo- 
cabularies have gaps, so that specific atten- 
tion can be given to those areas (diagnostic 
test). Vocabulary tests can also be used to 
help place students in the proper class level 
(placement test). Vocabulary tests which 
are part of commercial proficiency tests, 
such as the TOEFL (Educational Testing 
Service, 1987), attempt to provide a meas- 
ure of a learner's vocabulary size, which is 
believed to give an indication of overall 
language proficiency. Other possibilities 
include utilizing tests as a means to moti- 
vate students to study, to show students 
their progress in learning new words, and 
to make selected words more salient by 
including them on a test. Having a clear 
idea of which of these purposes the test will 
be used for can lead to more principled 
answers to the following questions. 

2. WHAT WORDS DO YOU WANT TO 
TEST? 

If the teacher wants to test the students' 
class achievement, then the words tested 
should obviously be drawn from the ones 
covered in class. It is better to avoid stand- 
ardized tests in this case, because unless an 
instructor teaches solely from a single book, 
any general-purpose test is unlikely to be as 
suitable to a particular classroom and set of 
students as one the instructor could cus- 
tom-make (Heaton, 1988). The teacher is in 
the best position to know her students and 
which words they should have mastered. 

Vocabulary tests used for placement or diagnostic
purposes may need to sample from
a more general range of words (Heaton, 
1988). If the students to be tested all come 
from the same school, or have been taught 
from similar syllabi, then it is possible to 
draw words from those taught in their 
courses. However, if students come from 
different schools with different syllabi and 
language teaching methodologies, as may 
be the case in a university placement situa- 
tion, then the words must be more broadly 
based. In these cases, words are often taken 
from word frequency lists. These lists were 
created by counting how frequently various
words appeared in a very large collection of 
written texts (Thorndike and Lorge, 1944; 
West, 1953; Kucera and Francis, 1967). Since 
students can generally be expected to know 
more frequent words best, regardless of 
their previous schooling, use of these lists 
allows the principled selection of target words
which can be adjusted for students' antici- 
pated language level. The results from tests 
based on these lists can supply information 
not only about how many words are known, 
but also at what frequency level. Tests 
based on word frequency lists can also be 
used within a school system.

Vocabulary tests which are part of profi- 
ciency tests need to include the broadest 
range of words of all. Many universities 
rely on commercial proficiency tests to con- 
trol admissions. Therefore, the tests must 
include a range of words which will pro- 
vide a fair evaluation of people of different 
nationalities, native languages, and cultures, 
as well as proficiency levels. Some of the 
words on these tests must be uncommon 
enough to differentiate between higher level 
test takers. 

3. WHAT ASPECTS OF THESE WORDS
DO YOU WANT TO TEST? 
After the words to be tested have been cho- 
sen, the next step is to decide which aspects 
of those words will be tested. Perhaps the 
first decision to be made is whether to 
measure the size of a student's vocabulary
(breadth of knowledge) or test how well he
knows individual words (depth of knowl- 
edge). Until recently, almost all vocabulary 
tests measured vocabulary size. The 
vocabulary components of many com- 
mercial tests attempt to give an indication 
of the overall vocabulary size of the testees. 
In the classroom, vocabulary achievement 
tests usually try to measure how many
words students know from the subset of 
words they studied. Placement and 
diagnostic tests have also commonly 
measured vocabulary size. If teachers are 
interested in finding out how many words 
their students know, they will probably de- 
cide to test only the conceptual meaning of 
words, since vocabulary size tests have 
traditionally measured only that aspect of 
word knowledge. 

However, Nation (1990) has pointed out 
that a person must know more than just a 
word's meaning in order to use it fluently. 
He lists eight kinds of native-speaker word 
knowledge: knowledge of a word's mean- 
ing, spoken form, written form, grammati- 
cal patterns (part-of-speech and derivative 
forms), collocations (other words which 
naturally occur together with the target word 
in text), frequency, associations (the meaning
relationships of words, e.g., diamond - hard,
jewelry, weddings), and stylistic restrictions 
(such as levels of formality and regional 
variation). Viewing vocabulary from this 
perspective, traditional meaning-based 
know/don't know tests are inadequate for 
measuring vocabulary knowledge. Depth 
of knowledge tests are needed which meas- 
ure some of these components of word 
knowledge, as well as how fluently they can 
be put into use. Reflection on the various 
types of word knowledge can help a teacher 
decide more precisely which of those aspects
she wants to measure and which test
formats are the most suitable for that 
purpose. For example, if she believes that 
collocational knowledge is important, she 
would want to use a test format which can 
capture that kind of knowledge, such as the 
Multiple True/False test discussed in the
last section of this paper. Also, as the nature
of vocabulary acquisition is incremental,
tests which consider word knowledge can
allow students to demonstrate the
components they possess at a given time,
even if they are not in full control of every
one.



Another important consideration is whether
the words will be tested receptively or
productively. Although this distinction is more
of a continuum than a dichotomy, most test
formats fit more easily into one category or
other. Examples of predominately receptive
test formats are multiple-choice, true/false,
and matching, while tests requiring L1
translations, L2 synonyms or definitions, 
and fill-in-the-blank are examples of 
productive tests. When should each be 
used? There are no hard and fast rules, but 
if a teacher is mainly interested in having his 
students recognize target words when 
reading, then a receptive test is suitable. If 
students are expected to be able to use the 
target words in their writing, then a 
productive test may be more appropriate. 
Also, it might be better to test newer words, 
to which the students have not yet had 
much exposure, with receptive tests, since it 
is generally considered that accurate pro- 
duction requires more control over word 
knowledge. 

The teacher should also consider the mode 
of the test. Although the vast majority of 
vocabulary tests are in the written mode, 
tests in the verbal mode are also possible; 
dictation and interviews are just two exam- 
ples. Test mode is related to another factor 
- whether the test will measure only 
vocabulary knowledge or whether it will 
measure how well vocabulary knowledge 
can be used in conjunction with other lan- 
guage skills, such as reading and writing. 
This is important because many test for- 
mats require the testee to rely heavily on 
other language skills to answer the item 
correctly. Let's look at two examples: 

1. Write a sentence illustrating the meaning
of gather.

2. Listen to the tape and write down the
word from the story that means the same as
greedy.

In Example 1, the student may know the 
meaning of gather, but might not be a profi- 
cient enough writer to produce a sentence 
expressing that knowledge. Example 2 
shows a task that tests listening ability as 
well as vocabulary. These kinds of test 
formats are fine if the teacher wants to 
measure the control of a word in a language 
usage context, but are less suitable if the 
teacher wants a discrete measure of whether 
the word's conceptual meaning is known or 
not. This latter case requires isolating the 
vocabulary knowledge as much as possible 
from proficiency in other language skills. 
Of course, this does not mean that vocabu- 
lary tests should be devoid of context. The 
point is that if teachers want to test mainly 
conceptual meaning, they should try to 
minimize the difficulty of the reading, writ- 
ing, speaking, and listening involved in the 
test items so that limitations in these 
language skills do not restrict students' 
ability to demonstrate their vocabulary 
knowledge. An example of how to achieve
this is to always use words of a higher
frequency (more common) in the definitions
and sentence/discourse context than
the target words being tested. 
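
The frequency check described above could be applied mechanically, as in the following minimal sketch. It assumes a hypothetical rank table (RANKS, where rank 1 is the most frequent word) that a teacher would build from whichever frequency list is in use; the ranks shown here are invented for illustration and do not come from any published list.

```python
# Hypothetical frequency ranks (1 = most frequent). These values are invented
# for illustration; a real table would be built from a published frequency list.
RANKS = {"a": 5, "to": 3, "bring": 600, "together": 450,
         "gather": 2500, "accumulate": 6000}


def harder_context_words(target, context, ranks=RANKS):
    """Return context words that are NOT more frequent than the target word.

    Words missing from the rank table are treated as rare and are flagged too.
    """
    target_rank = ranks[target]
    unknown = float("inf")
    flagged = []
    for word in context.lower().split():
        word = word.strip(".,;:!?()\"'")
        if word and word != target and ranks.get(word, unknown) >= target_rank:
            flagged.append(word)
    return flagged


print(harder_context_words("gather", "to bring together"))  # []
print(harder_context_words("gather", "to accumulate"))      # ['accumulate']
```

An empty result suggests the definition is unlikely to be harder than the word it defines; any flagged word is a candidate for rewording.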

4. HOW WILL YOU ELICIT STUDENTS' 
KNOWLEDGE OF THESE WORDS? 
This question involves decisions about con- 
structing the testing instrument, based on 
the answers to the preceding questions. The 
most important decision is what kind (or 
kinds) of test format will be used. Since 
different students may have different pref- 
erences and different strengths in testing, it 
may be a good idea to create a test combining
several test formats. Heaton (1988) discusses
several types of receptive and productive
test formats. If the test is to measure
depth of knowledge, the test format needs
to be carefully selected to ensure it is
conducive to measuring the kinds of word
knowledge to be tested. (For examples of this, see
the section on Depth of Knowledge Tests.)

The length of the test should also be
considered. For any test, the larger the number
of test items, the more accurate a picture it
will give of students' knowledge. Consequently,
situations in which important decisions are
made on the basis of test results would
normally call for longer and more
comprehensive tests. Some test formats, such as
checklist and some matching formats allow
a larger number of items to be completed
within a certain time period. However, the
law of diminishing returns has to be
considered, as student fatigue sets in on
tests requiring a long period of time. It is
also important to ensure that the majority of
students can complete all of the test items
within the given time period. For many
purposes, relatively short tests will suffice.
For example, tests given for motivational
purposes may only need to be 5-10 minutes
long.

The best vocabulary test is one in which a
student who knows a word is able to answer
the test item easily, while a student who
does not know the word will find it
impossible or very difficult to provide the correct
answer. Teachers should ensure that tests
have no misleading questions which would
trick students who know a word, but on the
other hand, tests should not give away any
clues which would help students to guess
unknown words. For example, Oller (1979)
lists the kind of clues that might give away
an answer in a multiple-choice test format:
the correct choice is either the longest or
shortest option, the opposite of the correct
choice is given, the alternatives repeatedly
refer to the information given in the correct
answer, and ridiculous alternatives are
included. The following example illustrates
these problems.

A rain forest is a luxuriant environment.

a. abundantly and often extravagantly rich and varied
b. containing little variation
c. abundant to some extent
d. containing monkeys and snakes

Even if a student did not know the target
word luxuriant in this admittedly extreme
example, she could probably guess the
correct option a. It is longer than the other
options and has the 'feel' of a dictionary
definition, having been taken directly from
Webster's Ninth New Collegiate Dictionary
(1987). Distractor options b and c both focus
attention on option a, while the last option is
too silly to consider. Having a colleague
look over a new test is a good way of
catching such clues that the test-designer is often
too 'close' to notice. In fact, it is always a
good idea to have someone take the test
before it is used in order to uncover problems
before it is too late.

While tests should have no obvious clues to
help the test-taker guess, it is important to
make sure there is enough context in
receptive tests to help students understand which
meaning of a word is being tested.
Productive tests require even more context
to narrow the possibilities down to the word
the teacher wants. But it is important to
remember the point already raised about
limitations in other language skills
preventing students from exhibiting their full
knowledge of words.



Tests Of Vocabulary Size

Since most teachers are probably aware of
several kinds of vocabulary achievement
tests, the next two sections will give brief
introductions to tests teachers are not likely
to be familiar with. This section presents
three tests which measure vocabulary size,
while the next section introduces three
experimental tests which attempt to measure
the depth of a student's vocabulary
knowledge.

A frequently used method of determining
the total size of a person's vocabulary in L1
research studies has been dictionary
method tests. They involve systematically
choosing words from a large dictionary, e.g.,
the fifth word from every tenth page. These 
words are then fixed on a test. The percent- 
age of correct answers is then multiplied by 
the number of words in the dictionary to 
arrive at an estimate of vocabulary size. 
Unfortunately, this method has many prob- 
lems, highlighted by widely varying esti- 
mates of native-speaker vocabulary size. A 
serious problem is that dictionaries of dif- 
ferent sizes have been used, leading to in- 
consistent results. Also, the number of test 
items compared to the total number of pos- 
sible words (sample rate) is very low. This 
method cannot really be recommended for 
determining the total vocabulary size of L2 
learners, especially since better methods are 
available. 
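
The arithmetic of the dictionary method, and the reason dictionary size matters so much, can be seen in a short sketch. The function name, the scores, and the two dictionary sizes below are invented for illustration only.

```python
def dictionary_method_estimate(num_correct, num_tested, dictionary_size):
    """Estimate vocabulary size as (proportion correct) x (dictionary size)."""
    if num_tested <= 0:
        raise ValueError("num_tested must be positive")
    return num_correct * dictionary_size / num_tested


# The same performance (60 of 100 sampled words known) scored against two
# hypothetical dictionaries of different sizes.
small = dictionary_method_estimate(60, 100, 50_000)
large = dictionary_method_estimate(60, 100, 250_000)
print(small, large)    # 30000.0 150000.0

# Sample rate for the larger dictionary: 100 test items out of 250,000 words.
print(100 / 250_000)   # 0.0004
```

An identical test performance yields estimates that differ by a factor of five, which is the inconsistency noted above, and the sample rate shows how few of the possible words are actually tested.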

One of these methods utilizes the concept 
that, in general, more frequent words are 
learned before less frequent words. Instead 
of using dictionaries, which can vary in size,
as a source, test words are taken
from frequency count lists. This method 
entails selecting one or more frequency lists 
and deciding on the criteria for picking 
words from the lists. The words from these 
lists are commonly split into frequency 
levels at 1,000 word intervals, although 
smaller groupings are possible. Words are
systematically selected from the levels the
testees are likely to know, such as the first
2,000 most frequent words for beginners.
The format is one where words and definitions
are matched. The percentage of answers
correct in each level's section is multiplied
by the total number of words in that
level. The scores from all applicable levels 
tests can be added together to arrive at a 
total vocabulary score. The obvious advan- 
tage of this method is that information is 
available about how many words learners 
know at each level. As such, it has even 
greater applications as a placement or diag- 
nostic test than a test of total vocabulary 
size. Another major advantage is that these 
tests are available. The original Vocabulary 
Levels Test appears in Nation (1990), and a
revised version with four different forms
per level is now being tested for validity and
equivalence (Schmitt and Nation, in prepa- 
ration). 
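
The level-by-level scoring described above can be sketched as follows. The number of items per level (20) and the scores are hypothetical and are not taken from the published test; the level size of 1,000 follows the interval mentioned in the text.

```python
def levels_estimate(results, level_size=1000):
    """Estimate words known per frequency level and in total.

    `results` maps a level label to (items_correct, items_tested).
    Each level contributes (proportion correct) x (words in the level).
    """
    per_level = {level: correct * level_size / tested
                 for level, (correct, tested) in results.items()}
    return per_level, sum(per_level.values())


# Hypothetical scores for one learner on three 1,000-word levels.
scores = {"1st 1,000": (18, 20), "2nd 1,000": (15, 20), "3rd 1,000": (8, 20)}
per_level, total = levels_estimate(scores)
print(per_level)  # {'1st 1,000': 900.0, '2nd 1,000': 750.0, '3rd 1,000': 400.0}
print(total)      # 2050.0
```

The per-level figures are what make this approach useful for placement or diagnosis: they show where a learner's knowledge begins to thin out, not just an overall total.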

A variation of the same concept features a 
completely different test format. Checklist 
tests use the same procedure in selecting the 
words to be tested, but the learners are only 
required to 'check' if they know a word or 
not. This kind of test means that learners 
can cover many more words than in tests 
with other item formats, and achieves a 
much better sampling rate. The obvious 
problem is that many subjects might over- 
estimate their vocabulary knowledge and 
check words they really do not know. To 
compensate for this, nonwords which look 
like real words but are not, such as flindex or 
trebron, are put into the test along with the 
real words. If some of these nonwords are 
'checked' that indicates that the student is 
overestimating his vocabulary knowledge. 
A formula compensates for this overestima- 
tion to give more accurate scores. The com- 
pensation formula works well if the stu- 
dents are careful and mark only a few 
nonwords, but if they mark very many, then
their scores are severely penalized and the 
test becomes unreliable. (For more on this 
method, see Meara and Buxton, 1987). There 
is a book of these checklist tests available, 
which includes a scoring table, called the 
EFL Vocabulary Tests 1 (Meara, 1992). There 
is also a commercial computerized version 
of this test available, the Eurocentres Vo- 
cabulary Size Test 2 (EVST) (Eurocentres, 
1990) which requires about nine minutes 
per student to complete. As with the Vo- 
cabulary Levels Test, either of these tests 
would be particularly suitable as a place- 
ment test. 
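
The compensation formula itself is not given here. As a rough illustration of how such an adjustment behaves, the sketch below uses a standard correction-for-guessing formula, (h - f) / (1 - f), where h is the proportion of real words checked and f is the proportion of nonwords checked. This is a stand-in chosen for illustration and is not necessarily the formula used in the published checklist tests; all item counts are invented.

```python
def corrected_known_proportion(real_checked, real_total,
                               nonwords_checked, nonword_total):
    """Adjust the proportion of real words checked for overestimation.

    Uses a generic correction for guessing, (h - f) / (1 - f). This is an
    assumption for illustration, not the published tests' own formula.
    """
    hit_rate = real_checked / real_total
    false_alarm_rate = nonwords_checked / nonword_total
    if false_alarm_rate >= 1.0:
        return 0.0  # every nonword was checked: no usable information
    adjusted = (hit_rate - false_alarm_rate) / (1.0 - false_alarm_rate)
    return max(0.0, adjusted)


# Three hypothetical students who all check 30 of 40 real words.
print(corrected_known_proportion(30, 40, 0, 20))   # 0.75 (no nonwords checked)
print(corrected_known_proportion(30, 40, 10, 20))  # 0.5  (half the nonwords checked)
careful = corrected_known_proportion(30, 40, 1, 20)
print(round(careful, 2))                           # 0.74 (one nonword checked)
```

Under this stand-in formula, a student who checks a single nonword loses very little, while one who checks half of them sees the score fall sharply, which mirrors the behaviour described above.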



Depth Of Knowledge Tests 

Since the area of testing for depth of vocabu- 
lary knowledge is so new, there are not yet 
many depth tests to examine. In fact, in a 
recent manuscript, Wesche and Paribakht 
(in preparation) found only one other depth 
test to compare with their own. Their
experimental test, the Vocabulary Knowledge
Scale (VKS), has students rate how
well they know a word on the following 
scale: 

I. I don't remember having seen this
word before.

II. I have seen this word before, but I
don't know what it means.

III. I have seen this word before, and I
think it means ________. (synonym or
translation)

IV. I know this word. It means
________. (synonym or translation)

V. I can use this word in a sentence:
________. (If you do this section,
please also do Section IV.)

(Wesche and Paribakht, in preparation) 

This test combines student self-reports with
production to ensure that students do know 
the words. This kind of test can give a 
teacher some indication of where along the 
acquisition continuum a word exists in a 
student's lexicon. In addition, because it 
emphasizes what students know, rather than 
what they don't know, by allowing them to 
show their partial knowledge of a word, it 
may be more motivating than other types of 
tests. But this test has several weaknesses 
that need to be addressed. One is that we 
cannot assume that a word is fully learned 
from just one synonym or sentence. An- 
other is that receptive knowledge is only 
tested in the first two steps. Also, the number 
of words that can be covered by such a
test format is rather limited. Most impor- 
tantly, the best way to score this test is not 
yet clear. 

Another test which attempts to measure
how well learners know a word is the Word
Associates Test being developed by Read 
(in preparation). This test has the potential 
to measure associative and collocational 
word knowledge, in addition to conceptual 
knowledge. In it, the target word is fol- 
lowed by eight other words, four of which 
have some relationship with the target word
and four which don't. The related words
can be synonyms or words similar in mean- 
ing (edit - revise), collocates or words which 
often occur together (edit - film), or words 
which have some analytical component re- 
lationship (electron - tiny). Learners are 
asked to circle the words which are related. 

edit

arithmetic    film    pole    publishing
revise    risk    surface    text

(Read, 1993)

The scoring system for this test is yet to be 
worked out, but must eventually take ac- 
count of the number of correct association 
words picked, as well as compensate for the
number of incorrect distractors circled. Also, 
since L2 associations are rather unstable 
(Meara, 1984), this test might be more suit- 
able for more advanced learners. 
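
Since no scoring system has been settled, the following is only one hypothetical scheme that meets the two requirements just mentioned: it counts correctly circled associates and subtracts incorrectly circled distractors, with a floor of zero. The answer key for edit is an assumption based on the example pairs given above (edit - revise, edit - film) and is not taken from the test itself.

```python
# The eight options shown for the target word "edit".
OPTIONS = {"arithmetic", "film", "pole", "publishing",
           "revise", "risk", "surface", "text"}

# Assumed answer key (hypothetical; the source does not list the four associates).
ASSOCIATES = {"film", "publishing", "revise", "text"}


def score_item(circled, associates=ASSOCIATES, options=OPTIONS):
    """Score one item: associates circled minus distractors circled, floor 0."""
    circled = set(circled) & options        # ignore anything not on the item
    hits = len(circled & associates)
    false_picks = len(circled - associates)
    return max(0, hits - false_picks)


print(score_item({"film", "publishing", "revise", "text"}))  # 4
print(score_item({"film", "revise", "risk"}))                # 1
print(score_item(OPTIONS))                                   # 0 (circled everything)
```

The floor of zero means that a learner who circles every word gains nothing, which is the kind of compensation for indiscriminate circling that any eventual scoring system would need.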

Cronbach (1943) suggests a test format which
aims to provide a more precise measurement
of word meaning. His Multiple True/False
Test asks several true/false questions
about the same word. The following examples
combine Cronbach's testing idea with
some of Nation's (1990) categories of word
knowledge. Although this test was created
for this paper and has not been validated, it
illustrates an approach to be explored which
may prove useful in measuring depth of 
vocabulary knowledge. 



Check each acceptable definition or use of 
the following words. 
run
    to move with quick steps
    a run in your hair
    to run in a race
    a river runs
    to run down a debt by paying it
    to run a business
    a run in a nylon stocking
    to score a run in football

tap
    to tap a telephone
    a gentle knock
    to embarrass someone
    a tap on a sink
    to hit strongly
    a tap on a car tire
    to tap one's fingers
    a tap on a beer keg

This test has the potential to address the 
polysemous meanings of a word, as well as 
offering possible collocations for students
to judge for correctness. Items can be
written to capture associative relationships, 
such as those in the Word Associates Test, 
or stylistic aspects if they are applicable to a 
word. However, as in the other tests, there 
are issues to be worked out. The scoring 
presents problems, although having stu- 
dents answer Y if they are certain of a posi- 
tive answer, N if they are certain of a nega- 
tive answer, and ? if they do not know either 
way has possibilities. It might be difficult to 
tell when students are guessing and when 
they actually know the information. Per- 
haps having more false options would help 
in this respect. This test also has a weakness 
similar to multiple-choice tests, in that plau- 
sible false options are difficult to write. In 
spite of these problems, the main reason for 
presenting this test is to show that existing 
testing techniques can be creatively adapted 
to measure depth of vocabulary knowledge.
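
One way the Y / N / ? response idea might be scored is sketched below: a point is gained for each correct Y or N, a point is lost for each incorrect one, and ? earns nothing. This scheme is hypothetical, and the four-statement answer key for run is an assumption made for illustration, since no key is supplied with the example test.

```python
# Assumed answer key for four of the "run" statements (True = acceptable use).
# Hypothetical: the example test does not come with a key.
KEY = {
    "to run a business": True,
    "to run in a race": True,
    "a run in your hair": False,
    "to score a run in football": False,
}


def score_multiple_true_false(responses, key=KEY):
    """Score Y/N/? responses: +1 if correct, -1 if wrong, 0 for '?' or blank."""
    score = 0
    for statement, is_acceptable in key.items():
        answer = responses.get(statement, "?")
        if answer == "?":
            continue
        if answer not in ("Y", "N"):
            raise ValueError(f"unexpected response: {answer!r}")
        score += 1 if (answer == "Y") == is_acceptable else -1
    return score


responses = {
    "to run a business": "Y",           # correct
    "to run in a race": "Y",            # correct
    "a run in your hair": "Y",          # wrong
    "to score a run in football": "?",  # unsure
}
print(score_multiple_true_false(responses))  # 1
```

Penalizing wrong answers while leaving ? neutral gives students a reason to admit uncertainty rather than guess, although it cannot by itself show whether a given Y or N was a guess.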



Conclusion 

Teachers will always be interested in
vocabulary size and how many words students
learn from a course or unit of study.
For this reason, tests which measure 
vocabulary size will remain important. 
However, there is also likely to be a growing 
interest in measuring how well those words 
are learned. We are now only at the 
beginning stage in the development of depth 
tests, as indicated by the weaknesses of the 
above examples. As better depth tests are 
devised, we are likely to see hybrid vocabu- 
lary tests, where size tests are supplemented 
with depth components to give a broader 
indication of a learner's lexical capabilities. 
It is hoped that the example tests briefly 
examined in this paper will suggest new 
ways of looking at vocabulary testing to 
English teachers and that the development 
questions discussed will give them a 
principled way of writing their tests in the 
future. 



Notes 

1. The EFL Vocabulary Tests are avail- 
able from: Centre for Applied 
Language Studies, University Col- 
lege, Swansea SA2 8PP, United 
Kingdom. 

2. The EVST software is available from: 
Eurocentres Learning Service, 
Seestrasse 247, CH-8038, Zurich, 
Switzerland. 